Superspeculative Microarchitecture for Beyond AD 2000
Authors
Abstract
In its brief lifetime of 26 years, the microprocessor has achieved a total performance growth of 10,000 times, thanks to technology improvements and microarchitecture innovations. Transistor count and clock frequency each increased by an order of magnitude in each of the microprocessor's first two decades: transistor count grew from 10,000 to 100,000 in the 1970s and to 1 million in the 1980s, while clock frequency rose from 200 kHz to 2 MHz in the 1970s and to 20 MHz in the 1980s. This incredible technology trend has continued: since 1990, both transistor count and clock frequency have already increased another 20 to 30 times.

During the 1980s, sustained instructions per cycle (IPC) also increased by almost an order of magnitude, from roughly 0.1 to 0.9. IPC is a measure of the instruction-level parallelism, or instruction throughput, achieved by the concurrent processing of multiple machine instructions. In the 1990s, IPC improvement is struggling and may not triple by 1999. New microarchitecture innovations are needed.

Current top-of-the-line microprocessors are four-instruction-wide superscalar machines; that is, they can fetch and complete up to four instructions in a single machine cycle. Such machines use pipelined functional units, aggressive branch prediction, dynamic register renaming, and out-of-order execution of instructions to maximize parallelism and tolerate memory latency. State-of-the-art processors include the Digital Equipment Alpha 21264, Silicon Graphics MIPS R10000, IBM/Motorola PowerPC 604, and Intel Pentium Pro. Even with such elaborate microarchitectures, against a potential 4 IPC, these machines typically sustain only about 0.5 to 1.5 IPC on real-world programs.

Worse yet, most studies indicate that machine efficiency drops even lower as we extrapolate to wider machines. One recent study found that while a hypothetical 2-instruction-wide machine achieves an IPC of 0.65 to 1.40, a comparable hypothetical 6-instruction-wide machine achieves only 1.2 to 2.3 IPC [1]. Such data imply that the current superscalar paradigm is running into rapidly diminishing returns on performance.

Future billion-transistor chips will inevitably implement machines that are much wider (issuing more than four instructions at once) and deeper (with longer pipelines). The question is, how do we harvest additional parallelism proportional to the increased machine resources? Several approaches have vocal advocates, each with valid reasons; they are

• reconfigurable parallel computing engines;
• specialized, very long instruction word (VLIW) machines;
• wide, simultaneous multithreaded (SMT) uniprocessors;
• single-chip multiprocessors (CMP);
• memory-centric computing engines (such as IRAM);
• …
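A back-of-the-envelope consistency check on the growth figures above (my own arithmetic using only the abstract's numbers, not a calculation from the paper itself): sustained performance scales as the product of clock frequency and sustained IPC,

\[ \text{performance} \;\propto\; f_{\mathrm{clk}} \times \mathrm{IPC}. \]

Plugging in the quoted values, clock frequency grew 100x through the 1980s (200 kHz to 20 MHz) and another 20 to 30x since 1990, roughly 2,000-3,000x overall, while sustained IPC grew about 9x (0.1 to 0.9). The product, a few times \(10^{4}\), agrees to within a small factor with the quoted 10,000x total performance growth.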
Similar articles
A survey of new research directions in microprocessors
Current microprocessors exploit instruction-level parallelism through a deep processor pipeline and the superscalar instruction-issue technique. VLSI technology offers several solutions for aggressive exploitation of instruction-level parallelism in future generations of microprocessors. Technological advances will replace gate delay with on-chip wire delay as the main obstacle to increase...
Execution Performance of the Scheduled Dataflow Architecture (SDF)
This paper presents an evaluation of a nonblocking, decoupled memory/execution, multithreaded architecture known as the Scheduled Dataflow (SDF). Recent focus in the field of new processor architectures is mainly on VLIW (e.g. IA-64), superscalar and superspeculative designs. This trend allows for better performance at the expense of increased hardware complexity, and possibly higher power expe...
Hydrodynamic Models for Heavy-Ion Collisions, and beyond
A generic property of a first-order phase transition in equilibrium, and in the limit of large entropy per unit of conserved charge, is the smallness of the isentropic speed of sound in the “mixed phase”. A specific prediction is that this should lead to a non-isotropic momentum distribution of nucleons in the reaction plane (for energies ∼ 40A GeV in our model calculation). On the other hand, ...
TEAPC: Adaptive Computing and Underclocking in a Real PC
TEAPC is an IBM/Intel-standard PC realization of the TEAtime performance-"maximizing" adaptive computing algorithm, giving performance beyond worst-case specifications. TEAPC goes beyond the TEAtime algorithm by adapting to the current CPU load. It is also the first machine to use extensive underclocking for disaster tolerance, low power consumption, and high reliability. This is all done dynamic...
Packings and Approximate Packings of Spheres
Close-packings of uniformly-sized spheres with centres on various lattices are described, with volume fractions equal or close to the maximum possible, \(\pi/\sqrt{18}\) (this value had long been 'known' via Kepler's conjecture, and has since been proved). Regular packings with two or three sizes of sphere can push this volume fraction beyond 80%. The bulk of the paper studies irregular 'packings' of a large sp...
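For context on those density figures (a standard value, supplied here because the expression was garbled in extraction), the Kepler bound for equal spheres is

\[ \frac{\pi}{\sqrt{18}} \;=\; \frac{\pi}{3\sqrt{2}} \;\approx\; 0.74048, \]

so the regular two- and three-size packings mentioned above, at over 80%, exceed the single-size optimum by a clear margin.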
Journal: IEEE Computer
Volume: 30
Issue: -
Pages: -
Publication date: 1997